ci(bench-gpu): harden teardown, cap pairs at 32, fix CUDA comment by MauroToscano · Pull Request #736 · yetanotherco/lambda_vm

MauroToscano · 2026-06-29T18:03:34Z

Review follow-ups for #724, targeting its branch so they fold into that PR. Only .github/workflows/benchmark-gpu.yml changes.

Correctness / cost

Teardown can no longer leak a paid box. The instance id is recorded only after vastai create instance returns and its JSON parses, but teardown was gated solely on that id file. A box created in that window — a concurrency cancel (cancel-in-progress: true) landing right after create, or a --raw schema change breaking the parse — would exist, bill indefinitely, and teardown would print "nothing to destroy". Teardown now falls back to destroying by the unique RUN_LABEL (gpu-bench-<run_id>-<run_attempt>) when no id is recorded, tolerating both possible vastai show instances --raw shapes and only ever matching this run's own box.
Pairs capped at 32 (was 40), odd counts rounded up to even, timeout raised to 210 min. At 40 pairs the worst case (80 proves + slow provisioning + dual CUDA build) could exceed the old 180-min timeout and die after the expensive build — paying for the box but producing no result. 32 keeps the worst case within budget; rounding odd→even honours the AB/BA balanced design (the "even is ideal" comment was previously unenforced).

Docs

Fix the CUDARC_PIN comment. It claimed "the boxes are CUDA 13.0", which contradicts the actual pin cuda-12080 (CUDA 12.8) and the cuda_max_good>=12.8 offer floor. Corrected to ~12.8, and tied to the MIN_DRIVER>=580 guard as the opposite (too-old vs too-new) end of the same driver-compatibility window. The pin value itself was already correct — only the explanatory comment was wrong.

Hardening (defense-in-depth)

Don't dump the full create.json. Replaced cat create.json with a jq projection of only the needed fields, so an unexpected sensitive field in the --raw response can't land in the (collaborator-/world-readable) run log.
Validate the workflow_dispatch branch name against the git-ref-safe charset before it is interpolated into the remote bash -lc command. workflow_dispatch is write-access only, so this isn't a privilege gain — just avoids feeding an unvalidated ref into a remote shell.
Surface workflow_dispatch failures. The run-summary write lived at the end of the bench step, so it was skipped whenever the bench failed (set -e/pipefail) — and dispatch runs have no PR comment step, so a failed dispatch showed nothing but the raw log. Moved it to an always() step that prints the result on success and the log tail on failure.

Deliberately left as-is per discussion: template name (NVIDIA CUDA Lambda VM 64GB is correct), building/running PR code as root on the rented box (bounded — box holds no secrets, destroyed after, collaborator-gated), no global cost cap (cost acceptable given trusted-collaborator gating), and issues: write (matches bench-abba.yml).

Validated with actionlint (clean) and unit-tested the clamp/round, branch validation, and teardown jq filter logic.

Note: #724's description still says pairs clamp to [2,40] — worth updating to [2,32] there.

Review follow-ups on the GPU benchmark workflow: - Teardown: fall back to destroying by the unique RUN_LABEL when no instance id was recorded. The id file is written only after `create` succeeds and its JSON parses, so a box created in that window (concurrency cancel, or a parse failure) could otherwise leak and bill indefinitely. - Cap pairs at 32 (was 40) and round odd requests up to even (the AB/BA design wants even N); raise the job timeout to 210 min so a worst-case 32-pair run (64 proves + slow provisioning + dual CUDA build) fits without timing out after the expensive build. - Fix the CUDARC_PIN comment: the boxes are ~CUDA 12.8 (matching cuda-12080 and the cuda_max_good>=12.8 offer floor), not 13.0; tie it to the MIN_DRIVER guard as the opposite end of the same compatibility window. - Log only the needed fields of create.json instead of the full --raw response, so an unexpected sensitive field can't land in the run log. - Validate the workflow_dispatch branch name before it is interpolated into the remote `bash -lc` command. - Move the run-summary write into an always() step so workflow_dispatch failures are visible in the Actions summary rather than only the raw step log.

JuArce approved these changes Jun 29, 2026

View reviewed changes

JuArce merged commit 80b6963 into gpu_benchmarks Jun 29, 2026
11 checks passed

JuArce deleted the gpu-bench-review-fixes branch June 29, 2026 19:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci(bench-gpu): harden teardown, cap pairs at 32, fix CUDA comment#736

ci(bench-gpu): harden teardown, cap pairs at 32, fix CUDA comment#736
JuArce merged 1 commit into
gpu_benchmarksfrom
gpu-bench-review-fixes

MauroToscano commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

MauroToscano commented Jun 29, 2026

Correctness / cost

Docs

Hardening (defense-in-depth)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants